how does the forum work

This is just an excuse for me to do a bit of nerdy rambling on how the entire thing works, since I think it's slightly different from other implementations.

In the spirit of RapPad's blogs, I've decided to make a simple blog. Before I start, I'd like to say that this is NOT Amir. I am not Amir, and I am not behind the actual RapPad site.

If you can't tell, I'm more of a technical person than a music person. I've been on RP since 2018 and got into a lot of drama (anyone can attest to that), mostly to dick around; a lot of my 'battle verses' are ad hominem armed with a rhyme scheme.

..with that out of the way, let's start with a general diagram of how the new 'forum' works:

General stack

Generally speaking, the forum's backend is in golang, with much of the main forum experience built on the usual golang+htmx stack. Client-side stuff is written in vanilla JS, because I can't be fucked adding anything extra, plus I think the simple stuff doesn't really warrant adding things like Next.js. God, golang is beautiful. Fun fact: the first iteration of the forum, which I wrote in 2023 when I was 15ish, was in PHP and was actually pretty secure.

So broadly the tech stack is:

  1. Go, HTMX
  2. Envoy, Cloudflare, Cloudflare R2, Workers
  3. Postgres, "Mesa" (for caching)
  4. Docker
  5. PM2

Now on to some random shit:

Envoy module

I wrote a few modules to help deter basic passive recon/enumeration. Gobuster, for instance, checks whether a site is enumerable by (oddly enough) sending a few GETs for UUID paths before bombing the site with a wordlist. To counter this and confuse an attacker, we can simply detect whether someone is GETting /uuid, respond with a 200, strip any X-Forwarded-For or X-Remote-IP header, read cf-connecting-ip, track it, and from then on respond to all of their queries with 200 OKs, which completely confuses Gobuster.
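A rough sketch of the tripwire logic described above, written as plain Go net/http middleware rather than an actual Envoy module (that swap is an assumption made purely for illustration). The names Tripwire, Observe and Middleware are hypothetical, and the in-memory map stands in for whatever tracking the real module does.

```go
package main

import (
	"fmt"
	"net/http"
	"regexp"
	"sync"
)

// uuidPath matches a request path that is nothing but a UUID,
// which is the kind of probe Gobuster sends before the wordlist.
var uuidPath = regexp.MustCompile(
	`^/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)

// Tripwire remembers which client IPs have sent a UUID probe.
type Tripwire struct {
	mu      sync.Mutex
	flagged map[string]bool
}

func NewTripwire() *Tripwire {
	return &Tripwire{flagged: make(map[string]bool)}
}

// Observe records the request and reports whether this client should be
// fed fake 200 responses from now on.
func (tw *Tripwire) Observe(ip, path string) bool {
	tw.mu.Lock()
	defer tw.mu.Unlock()
	if uuidPath.MatchString(path) {
		tw.flagged[ip] = true
	}
	return tw.flagged[ip]
}

// Middleware drops client-controllable IP headers, keys on CF-Connecting-IP
// (assumed to always be set because traffic arrives via Cloudflare), and
// answers flagged clients with an empty 200 for every path.
func (tw *Tripwire) Middleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		r.Header.Del("X-Forwarded-For")
		r.Header.Del("X-Remote-IP")
		ip := r.Header.Get("CF-Connecting-IP")
		if tw.Observe(ip, r.URL.Path) {
			w.WriteHeader(http.StatusOK)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	tw := NewTripwire()
	fmt.Println(tw.Observe("203.0.113.7", "/index.html"))
	fmt.Println(tw.Observe("203.0.113.7", "/3f2b8c1e-9d4a-4f6b-8a7e-1c2d3e4f5a6b"))
	fmt.Println(tw.Observe("203.0.113.7", "/admin"))
}
```

Once every path "exists", a wordlist scan can no longer tell real routes from fake ones, which is the whole point.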

RapPad interaction

Talking to RapPad's API is simple. I'll just show you one endpoint for this example:

/api/users/:userid - this returns something like this:

{
  "user": {
    "id": 1234,
    "username": null,
    "score": 0,
    "picture": null,
    "raps": [
      {
        "id": 1860,
        "title": "love for 187 SG Mobz",
        "created_at": "2013-09-06T14:14:36.000Z",
        "visibility": "private"
      }
    ]
  }
}

Notice something interesting? This returns private and unlisted raps as well. This probably isn't a major concern in my opinion, because literally no one is going to weaponize this on RapPad nowadays, but it is what I use to make this happen:

Everything else (e.g. the Notifier bot) operates on a similar principle, by just observing and reverse-engineering API endpoints. Obviously this barely covers all of the endpoints, but you get the idea.
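For illustration, decoding the /api/users/:userid response shown above could look roughly like this in Go. The struct and function names (User, Rap, decodeUser) are made up for this sketch; only the JSON field names come from the sample response, and username and picture are pointers because the sample shows them as null.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Rap mirrors one entry of the "raps" array in the sample response.
type Rap struct {
	ID         int       `json:"id"`
	Title      string    `json:"title"`
	CreatedAt  time.Time `json:"created_at"`
	Visibility string    `json:"visibility"`
}

// User mirrors the "user" object. Nullable fields are pointers.
type User struct {
	ID       int     `json:"id"`
	Username *string `json:"username"`
	Score    int     `json:"score"`
	Picture  *string `json:"picture"`
	Raps     []Rap   `json:"raps"`
}

// userEnvelope is the top-level wrapper: {"user": {...}}.
type userEnvelope struct {
	User User `json:"user"`
}

// decodeUser parses a response body into a User.
func decodeUser(body []byte) (User, error) {
	var env userEnvelope
	if err := json.Unmarshal(body, &env); err != nil {
		return User{}, err
	}
	return env.User, nil
}

// sample is the response body from the post, verbatim.
const sample = `{
  "user": {
    "id": 1234,
    "username": null,
    "score": 0,
    "picture": null,
    "raps": [
      {
        "id": 1860,
        "title": "love for 187 SG Mobz",
        "created_at": "2013-09-06T14:14:36.000Z",
        "visibility": "private"
      }
    ]
  }
}`

func main() {
	u, err := decodeUser([]byte(sample))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	for _, r := range u.Raps {
		fmt.Printf("%d %q (%s)\n", r.ID, r.Title, r.Visibility)
	}
}
```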

Hiding the Origin

I don't like revealing the origin IP of my box, and I certainly don't want to be blocked by CF. To solve this, I wrote a simple Worker script that runs on Cloudflare's network and essentially fetches responses for me. I can then do my usual HTML attribute matching in Go, and we can do our server-side logic without accidentally exposing the origin or getting listed on Censys or Netlas.

The Bot

The Notifier Bot is just the application of the two things I mentioned above. In order to talk to the backend, I wrote a simple proto for gRPC between the backend and the bot, then I put mTLS between both sides just in case.

Username syncing

This is a bit fucky. Usernames are always locked until they are synchronized; the sync then assigns the new username and releases the old one. This is obviously a major race-condition spot (a critical section), so I placed a lock around it just in case. The same can be said for pretty much every flow that talks to the RapPad API, including signing up.
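A minimal sketch of why the lock matters, assuming an in-memory registry keyed by RapPad user ID (the real forum presumably does this against Postgres; Registry, Sync and ErrTaken are hypothetical names). The check for the new name, the release of the old name and the claim of the new one all happen under a single mutex, so two concurrent syncs can never both end up holding the same username.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrTaken is returned when another account already holds the username.
var ErrTaken = errors.New("username already claimed by another account")

// Registry maps RapPad user IDs to forum usernames and back.
type Registry struct {
	mu     sync.Mutex
	byUID  map[int]string // RapPad user ID -> current username
	byName map[string]int // username -> owning RapPad user ID
}

func NewRegistry() *Registry {
	return &Registry{byUID: make(map[int]string), byName: make(map[string]int)}
}

// Sync moves uid onto newName and releases its old name.
// The whole swap runs inside one critical section.
func (r *Registry) Sync(uid int, newName string) error {
	r.mu.Lock()
	defer r.mu.Unlock()

	if owner, ok := r.byName[newName]; ok && owner != uid {
		return ErrTaken
	}
	if old, ok := r.byUID[uid]; ok && old != newName {
		delete(r.byName, old) // release the previous username
	}
	r.byUID[uid] = newName
	r.byName[newName] = uid
	return nil
}

func main() {
	reg := NewRegistry()
	fmt.Println(reg.Sync(1, "johnny")) // first claim
	fmt.Println(reg.Sync(2, "johnny")) // rejected while user 1 holds it
	fmt.Println(reg.Sync(1, "john"))   // user 1 renames, "johnny" is released
	fmt.Println(reg.Sync(2, "johnny")) // now user 2 can take it
}
```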

Storage and Caching

For data storage, I run a simple Postgres DB in Docker. For caching, I didn't want to use Redis since a) there's no need for many of the features Redis offers and b) I'm not really sure

I decided to write a lightweight, more specific solution, since ideally I just want to cache search results. Mesa is the answer. All Mesa is is a featherweight, speedy, Go-based sharded in-memory KV store with lock-free reads/writes and LRU eviction of old data. I'll probably release it soon since I've found it pretty handy. Obviously, to conserve space I don't just store whole objects in Mesa. Instead I have an arena of strings (i.e. a string is referenced by the arena plus some offset n stored in the serialized format) and use FlatBuffers for binary serialization, so Mesa can be extremely performant and efficient at storing a bunch of threads/pages/etc.

// e.g. a value type to cache
type User struct {
    username string
    id       int
    // ...
}

// Create a store typed on User.
store, err := mesa.New[User](mesa.Options{
    Shards:       128,
    Capacity:     10000,
    DefaultTTL:   5 * time.Minute,
    JanitorEvery: 30 * time.Second,
})
if err != nil {
    // handle the error
}
// ...

// Write a value, then read it back by key.
store.Set("k", User{username: "John", id: 1})
v, flag := store.Get("k")
// ...
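The string arena idea mentioned above can be sketched in a few lines. This is not Mesa's actual code (it isn't released), just an illustration of referencing strings by offset into one shared buffer instead of storing each string separately. Arena, Ref, Add and Get are hypothetical names.

```go
package main

import "fmt"

// Ref locates one string inside the arena: where it starts and how long it is.
type Ref struct {
	Off uint32
	Len uint32
}

// Arena packs many strings back to back into a single byte slice.
type Arena struct {
	buf []byte
}

// Add appends s to the arena and returns a compact reference to it.
func (a *Arena) Add(s string) Ref {
	ref := Ref{Off: uint32(len(a.buf)), Len: uint32(len(s))}
	a.buf = append(a.buf, s...)
	return ref
}

// Get copies the referenced bytes back out as a string.
func (a *Arena) Get(r Ref) string {
	return string(a.buf[r.Off : r.Off+r.Len])
}

func main() {
	var a Arena
	name := a.Add("John")
	title := a.Add("how does the forum work")
	fmt.Println(a.Get(name), "|", a.Get(title))
	fmt.Println(name, title) // the refs are just two small integers each
}
```

A serialized record then only needs to carry the 8-byte Ref rather than the string itself, and one big buffer is friendlier to the garbage collector than thousands of small strings.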

Profile pictures and attachments are saved to Cloudflare R2 object storage.

Authentication

To prove that someone owns e.g. @Johnny, the forum computes the HMAC of the username Johnny and tells the user to slap it onto their bio. That way I don't have to store stuff like emails, and people can sign up trivially. To verify, we fetch the profile through the worker proxy, look for the token inside the specific bio class attribute, and recalculate the HMAC to see if it matches, so the entire thing is stateless. Again, a very dirty but simple trick. The token is also wrapped in a specific string, I_acknowledge_this_is_for_rappad.forum_verify-[HMAC], so people can't just social engineer each other into pasting it and then impersonate them.
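Here is a small sketch of that stateless check. The hash function (SHA-256), the hex encoding and the function names token and verify are assumptions for illustration; only the I_acknowledge_this_is_for_rappad.forum_verify- prefix comes from the post. The bio is passed in as a plain string, standing in for whatever the worker proxy scrapes.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// prefix makes it obvious to the user what they are pasting into their bio.
const prefix = "I_acknowledge_this_is_for_rappad.forum_verify-"

// token derives the verification string for a username from a server secret.
func token(secret []byte, username string) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(username))
	return prefix + hex.EncodeToString(mac.Sum(nil))
}

// verify reports whether bio contains the token that belongs to username.
// Nothing is stored server-side: the expected token is simply recomputed.
func verify(secret []byte, username, bio string) bool {
	i := strings.Index(bio, prefix)
	if i < 0 {
		return false
	}
	rest := bio[i+len(prefix):]
	const macHexLen = sha256.Size * 2 // hex doubles the digest length
	if len(rest) < macHexLen {
		return false
	}
	got := prefix + rest[:macHexLen]
	want := token(secret, username)
	return hmac.Equal([]byte(got), []byte(want)) // constant-time comparison
}

func main() {
	secret := []byte("demo-secret-not-for-production")
	tok := token(secret, "Johnny")
	bio := "battle rapper since forever " + tok

	fmt.Println(verify(secret, "Johnny", bio))  // Johnny's bio carries Johnny's token
	fmt.Println(verify(secret, "Mallory", bio)) // the same token proves nothing for Mallory
}
```

Because the token is bound to the username, pasting someone else's token into your own bio gets you nowhere.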

Ratelimiting and Proof-of-Work

I don't like using captchas for this site since they probably turn people off. To solve this I just used a typical hashcash-style PoW. The whole idea is that we hand out a simple challenge string, generated by SHA256("foobar"+nonce), give the user the resultant hash and the nonce, and challenge them to compute a new hash that starts with a certain number of zeroes. This takes less than a few seconds on most machines, but it is enough to deter most attackers, since in aggregate it burns through compute when botting.

Anything that is 'heavy', e.g. looking up a user through RapPad or talking to the notifier bot over gRPC to do something major, is guarded by a simple ratelimiting system that's basically just exponential backoff. And obviously, it ratelimits by user ID.
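A generic hashcash-style sketch of the proof-of-work part, to show the shape of the idea. The exact construction here (appending a decimal counter to the challenge and counting leading zero bits of the SHA-256 digest) is an assumption and may differ from what the forum really does; solve, check and leadingZeroBits are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/bits"
	"strconv"
)

// leadingZeroBits counts how many zero bits the digest starts with.
func leadingZeroBits(sum [32]byte) int {
	n := 0
	for _, b := range sum {
		if b == 0 {
			n += 8
			continue
		}
		n += bits.LeadingZeros8(b)
		break
	}
	return n
}

// check is the cheap server-side step: a single hash per submission.
func check(challenge string, counter uint64, difficulty int) bool {
	sum := sha256.Sum256([]byte(challenge + strconv.FormatUint(counter, 10)))
	return leadingZeroBits(sum) >= difficulty
}

// solve is the expensive client-side step: brute-force a counter that
// passes check. Expected work doubles with each extra difficulty bit.
func solve(challenge string, difficulty int) uint64 {
	for c := uint64(0); ; c++ {
		if check(challenge, c, difficulty) {
			return c
		}
	}
}

func main() {
	const challenge = "example-challenge" // would be issued per request by the server
	const difficulty = 16                 // about 65k hashes on average

	c := solve(challenge, difficulty)
	fmt.Println("counter:", c, "valid:", check(challenge, c, difficulty))
}
```

The asymmetry is the point: the client pays tens of thousands of hashes per request, the server pays one to verify.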

Frontend

Literally raw HTML and JS with SASS.

Admin tooling

Currently, to administrate I have to SSH into the box and run go run ./cmd/admin. I can pin/unpin threads, delete accounts, delete threads, reset passwords, create an account (which automatically claims a RapPad account as well), automatically update everyone's usernames based on their actual RapPad UID, and list all users.

What does the DB look like?

I'll try to reveal as little as possible here but this is what the average account looks like at the moment:

id BIGSERIAL PRIMARY KEY,
uhash TEXT NOT NULL UNIQUE,
phash TEXT NOT NULL,
created_at TIMESTAMPTZ DEFAULT now()

... where uhash is a SHA512 hash and phash is an argon2 hash. Anyone who dumps the DB won't really learn much.

Canary system

To catch other classes of vulns that people may try, I've intentionally tripwired a small subset of them to return fake responses, which will flag the person involved. For example, a really specific attack I might've expected at a very specific endpoint may trigger fake behavior from the server. If you find a vulnerability, please confirm it's actually exploitable beyond any reasonable doubt before relaying it to me.

Starvation

The core principle behind securing the forum is to reveal as little information as possible about the server (which is why a lot of stuff is essentially layered away). In my opinion, any stack trace leak or information disclosure that reveals even a slight bit counts as a blunder.